1. Introduction

Wireless communications have slowly become the backbone of the warfighter, and it is nearly impossible to coordinate and survive modern theater interactions without them. Much modern-day research on military radio has focused on adaptability, higher bandwidths, distributed architectures, jam resistance, and survivability.

In  this  project,  we  focused  on  the  architectural  and  circuit-level  design  for  a  low  cost,  portable 
soldier  radio  with  high  survivability.  In  particular,  we  focused  on  the  following  goals  for  a  high- 
performance  survival  radio  using  discrete  time  analog  signal  processing: 

• Small & portable (i.e., highly integrated)

•  Low  power  (potentially  CMOS) 

•  Adaptability  (dynamic  spectrum  access) 

o  Adaptable  carrier  frequency  &  adaptable  signal  bandwidth 

•  Robust  to  jammers 

o  Ability  to  handle  large  out  of  band  jammers 
o  Ability  to  handle  in  band  jammers 

•  Location  information  (may  require  wide  signal  bandwidth) 

•  Always  connected  (highest  priority) 

o  Always  provide  minimum  connectivity  for  voice,  increased  signal  throughput  as 
needed 

We  identified  passive  switched  capacitor  circuits  as  a  suitable  candidate  for  meeting  these 
performance  targets.  In  particular,  passive  switched  capacitors  have  the  following  advantages: 

•  High  speed  of  operations 

•  Very  low  power  (switching  power) 

•  Adaptability  (all  linear  algebra  operations  can  be  performed) 

o  Adaptable  carrier  frequency  (200MHz  -  3GHz) 
o  Adaptable  signal  bandwidth  (5MHz  to  25MHz) 

For survival radio applications, robustness to jammers is critical. This requires either blocking the jammers or using a very linear, high dynamic range front end. We had originally proposed a series of sinc² and notch filters to attain the required tone suppression. However, an alternate architecture that uses a high dynamic range FFT can detect low-amplitude signals in the presence of large jammers. In the new architecture, the received signal goes to a low-noise transconductance amplifier (LNTA), the output of which is sampled onto capacitors. The current domain sampling contributes a built-in anti-alias filter, which can be used to filter out the large jamming tones. Following this filter, a sampled charge domain FFT is performed. The LNTA design for this project has not yet been completed but will be incorporated into the FFT design at a later time.

A  very  high  dynamic  range  is  required  in  the  FFT  so  that  it  is  able  to  function  in  the  presence  of 
very  large  jammers.  In  order  to  increase  the  dynamic  range  of  the  sampled  charge  FFT 
architecture,  we  performed  an  analysis  of  the  major  causes  of  non-linearity  and  noise  in  these 
circuits. These non-idealities were carefully modeled and used in simulations. We also developed a number of circuit techniques to alleviate these non-idealities and significantly improve the dynamic range of the FFT processor. Sampled charge processing, the non-idealities, and the techniques to overcome them are described in detail below.


2.  Dynamic  Range  in  Passive  Switched  Capacitors 

2.1.  Sampled  charge  processing 

Signal sampling and variable-rate analog signal processing are performed in the charge domain because of three inherent benefits: a built-in anti-alias filter in the sampler [Carl95], robustness to jitter [Mirz08], and the ability to move the resulting filter notches simply by varying the integration time [Xu00, Yuan00, Karv06, Abid07].

Many of the benefits of the discrete-time FFT architecture are based on the use of passive discrete-time charge-based computations. This is best illustrated with an example design. The passive switched-capacitor circuit shown in Fig. 1 is able to operate at RF sampling speeds [Yuan00, Karv06, Abid07].

In this circuit the input signal is sampled progressively in time (Φ1 to ΦN). After N clock periods the averaged output is sampled onto the capacitor Cs, which has previously been discharged. The complete circuit implements an N-tap FIR filter that is decimated by N. Interestingly, if the capacitor Cs is not discharged between rotations, the circuit implements an N-tap FIR filter combined with a first-order IIR filter that is decimated by N. Note that there is no "active" element (i.e., amplifier) in this circuit. The circuit consists only of switches and capacitors, so the maximum sampling rate depends only on the RC settling times of the switches. Additionally, the only power dissipation, other than that required for sampling the signal from the input, is due to the charging and discharging of transistor gate capacitances in a very digital-like way. As a result, a variety of functions on the sampled signal can be computed very fast and with minimal power.
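
As a rough behavioral illustration of the circuit described above, the following Python sketch models the rotating sampler as ideal charge sharing between N unit sampling capacitors and the output capacitor Cs. With Cs discharged before every rotation it behaves as an N-tap FIR (block average) decimated by N; without the discharge, the charge left on Cs adds a first-order IIR term. The function name, the unit capacitor value, and the assumption of complete, noiseless settling are hypothetical choices made for illustration only.

```python
import numpy as np

def rotating_sampler(x, n_taps, c_unit=1.0, c_s=1.0, discharge=True):
    """Ideal charge-sharing model of an N-capacitor rotating sampler.

    x         : input samples, one per clock phase
    n_taps    : N, number of sampling capacitors (and decimation factor)
    c_unit    : value of each sampling capacitor (assumed equal)
    c_s       : value of the output capacitor Cs
    discharge : True  -> Cs reset every rotation (FIR only)
                False -> Cs keeps its charge (FIR + first-order IIR)
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // n_taps
    blocks = x[: n_blocks * n_taps].reshape(n_blocks, n_taps)
    c_total = n_taps * c_unit + c_s
    out = np.empty(n_blocks)
    v_s = 0.0
    for i, block in enumerate(blocks):
        if discharge:
            v_s = 0.0
        # Charge conservation: all capacitors end at Q_total / C_total.
        charge = c_unit * block.sum() + c_s * v_s
        v_s = charge / c_total
        out[i] = v_s
    return out

fir_only = rotating_sampler(np.ones(8), n_taps=4)
fir_iir = rotating_sampler(np.ones(8), n_taps=4, discharge=False)
```

In this model the FIR-only output is attenuated by N·C_unit/(N·C_unit + Cs), and the IIR pole sits at Cs/(N·C_unit + Cs), so the capacitor ratio sets both the gain and the IIR bandwidth.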

2.2.  Non-idealities  in  passive  switched  capacitor  circuits 

Several non-idealities affect passive switched capacitor circuits. The problem is aggravated by the absence of a virtual ground node, unlike in op-amp based switched capacitor circuits. The effect of sampling clock jitter in passive switched capacitor circuits has been analyzed. We use current mode sampling to make the circuits tolerant to sampling clock jitter [Mirz08].

Two important non-idealities, clock feed-through and charge injection, become a nuisance in the absence of a virtual ground node. Traditional circuit techniques such as bottom plate sampling are consequently difficult to implement. Also, poor matching between nMOS and pMOS switches and the shrinking difference between Vdd and Vth in scaled technologies make transmission gate switches less effective for mitigating these non-idealities.

The  noise  in  the  system  is  dominated  by  the  kT/C  noise  of  the  R-C  filter  formed  by  the  switch- 
capacitor  combination.  Moreover,  for  a  multi-stage  switched  capacitor  operation,  the  sampled 
noise  voltages  from  one  stage  recombine  in  the  later  stages.  These  combining  noise  samples  in  a 
particular  stage  are  correlated,  and  therefore,  the  final  noise  becomes  a  complicated  function  of 
the  noise  sampled  at  each  stage  of  the  switched  capacitor  operation. 

The  switch  resistance  (along  with  the  capacitor’s  capacitance)  determines  the  settling  time 
constant.  However,  the  switch  resistance  is  inherently  non-linear  and  input  signal  dependent. 
Consequently,  in  the  case  of  high  speeds  of  operation,  incomplete  settling  can  cause  significant 
signal  dependent  errors  in  computations. 


2.3.  Modeling  of  non-idealities: 

Passive switched capacitor architectures are becoming increasingly important with the scaling of technology and the newly recognized advantages of charge domain sampling. Most of the non-idealities specific to these circuits (briefly described above) have not been well modeled. In this work, we modeled these non-idealities and devised techniques to reduce their adverse effects on the performance of these architectures.


2.3.1.  Charge  accumulation 

Consider  a  switch  connected  between  two  capacitors  for  charge  sharing.  When  the  switch  turns 
on,  the  charge  required  to  build  the  channel  (charge  accumulation)  is  obtained  from  the  two 
sharing  capacitors.  The  relative  amount  of  charge  from  each  capacitor  depends  on  the 
capacitance  as  well  as  the  charge  on  each  end  of  the  switch.  This  introduces  an  error  in  each 
voltage  at  the  beginning  of  the  operation.  If  the  capacitors  have  sufficient  time  for  settling,  both 
of  them  settle  to  the  same  voltage  independent  of  this  initial  error.  Therefore,  for  sufficient 
settling  time,  the  charge  accumulation  error  can  be  ignored.  However,  if  the  speed  of  operations 
is  so  high  that  complete  settling  is  not  feasible,  a  part  of  this  initial  error  is  retained  on  the 
participating  capacitors,  and  needs  to  be  modeled. 

Fig. 2: Modeling of charge accumulation and injection in switched capacitor circuits

We modeled charge accumulation by calculating the resistance seen by each point in the channel on either side (Fig. 2). The resistance from a point in the channel to a terminal was calculated as the integral, from the point to the terminal, of the reciprocal of the long-channel triode charge profile versus Vds. Using the resistance seen towards each terminal, we calculated the charge split ratio for each point and then integrated the charge split over the complete channel using the instantaneous Vgs. The charge split in the saturation region was assumed to be 1:0, with all the charge provided by the source. The channel charge fractions from the charge splits in the triode and saturation regions were then combined to obtain the total charge split.

2.3.2.  Charge  injection 

Charge  injection  is  the  channel  charge  dumped  onto  the  participating  capacitors  at  the  end  of  a 
charge  sharing  operation.  Again,  it  depends  on  the  impedance  looking  into  each  end.  However, 
since  this  phenomenon  occurs  at  the  end  of  an  operation,  an  error  is  invariably  introduced.  For 
modeling  the  charge  injection  accurately  for  simulation  purposes,  we  used  a  technique  akin  to 
that  used  for  charge  accumulation  above.  However,  unlike  in  the  case  of  charge  accumulation, 
the  Vds  is  always  very  small,  and  the  transistor  remains  in  the  deep  triode  region  of  operation 
throughout. 

2.3.3.  Clock  feed-through 

During  the  rising  or  falling  edge  of  the  clock,  some  charge  is  injected  into  the  sharing  capacitors 
due  to  the  voltage  divider  action  of  the  parasitic  capacitance  of  the  switch  with  the  sharing 
capacitors.  As  a  result,  clock  feed-through  has  a  similar  effect  as  charge  injection  and 
accumulation  on  the  performance  of  the  circuit.  For  a  transistor  switch,  clock  feed-through  from 
the  rising  edge  affects  the  circuit  only  in  the  case  of  incomplete  settling,  while  that  from  the 
falling  edge  always  affects  the  circuit. 

The  parasitic  capacitance  during  the  clock  transition  was  estimated  and  used  to  accurately  model 
the  total  voltage  error  caused.  This  error  was  taken  into  account  in  the  overall  model  for 
optimization  purposes.  When  clock  feed-through  cancelation  schemes  such  as  half-dummy 
switches  were  used,  mismatch  effects  were  included  in  the  models  to  capture  errors  caused  by 
fabrication  non-idealities. 
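
The voltage-divider action described above can be written down directly. The sketch below is only a first-order estimate under stated assumptions: a single lumped parasitic capacitance couples the clock edge onto one sampling capacitor, and both the function name and the example values are hypothetical rather than taken from this work.

```python
def clock_feedthrough_error(v_clk_step, c_parasitic, c_sample):
    """First-order clock feed-through error on a sampling capacitor.

    v_clk_step  : clock edge amplitude in volts (negative for a falling edge)
    c_parasitic : lumped gate-to-terminal parasitic capacitance (assumed constant)
    c_sample    : sampling capacitor value
    """
    # Capacitive divider between the parasitic and the sampling capacitor.
    return v_clk_step * c_parasitic / (c_parasitic + c_sample)

# Hypothetical example: 1 V falling edge, 1 fF parasitic, 99 fF sampling capacitor.
error_v = clock_feedthrough_error(-1.0, 1e-15, 99e-15)
```

The expression makes the capacitor-size trade-off visible: for a fixed parasitic, the error shrinks roughly in proportion to the sampling capacitance.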


2.3.4.  Settling 

The voltage settling in a switched capacitor operation is determined by the R-C time constant of the switch-capacitance pair. If the time available for settling is small compared to the time constant, τ = RC, settling is incomplete.

It  is  important  to  note  here  that  the  on-resistance  of  the  switch  is  variable.  As  a  result,  the  time 
constant  of  the  settling  operation  is  time  dependent.  This  needs  to  be  accurately  modeled  in  order 
to  reliably  capture  the  error  caused  by  incomplete  settling. 


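Consider a share and scale operation in which two voltages V_A and V_B on capacitors C_1 = C and C_2 = C are shared with each other, and a third, initially discharged stealing capacitor C_k is used for scaling their sum. For a scaling factor

    k = \frac{2}{2 + C_k/C},

the voltages after complete settling should ideally be

    V_A(\infty) = V_B(\infty) = k \, \frac{V_A + V_B}{2} = \frac{V_A + V_B}{2 + C_k/C}.

However, in the case of incomplete settling, the voltage after a settling time t is given by the sum of a difference settling term and a scaled sum settling term:

    V_A(t) = \frac{V_A - V_B}{2} \, e^{-t/\tau_d} + \frac{V_A + V_B}{2} \left[ k + (1 - k) \, e^{-t/\tau_s} \right]

where,

    \tau_d : difference term settling time constant (e.g., for a 2-point share/scale, \tau_d = RC)
    \tau_s : sum term settling time constant (e.g., for a 2-point share/scale, \tau_s = RC(1 - k))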


As  can  be  seen  from  the  above  equations,  the  sum  settling  time  constant  is  lower  than  the 
difference  settling  time  constant.  Therefore,  in  the  case  of  incomplete  settling,  a  larger  difference 
settling  error  is  caused. 
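
To make the two-time-constant behaviour concrete, the sketch below evaluates a share and scale operation stopped after a finite time. It assumes a symmetric circuit with constant (linear) switch resistance, so that the difference term decays with τd = RC and the sum term approaches its scaled value with τs = RC(1 - k), with k = 2/(2 + Ck/C). The function name and the example numbers are hypothetical, and the signal-dependent switch resistance discussed above is deliberately ignored.

```python
import math

def share_scale_voltage(v_a, v_b, ck_over_c, t_over_rc):
    """Voltages on the two input capacitors after an incomplete share and scale.

    v_a, v_b   : initial voltages on the two equal capacitors C
    ck_over_c  : stealing capacitor size relative to C
    t_over_rc  : available settling time, in units of RC
    """
    k = 2.0 / (2.0 + ck_over_c)      # ideal scaling factor
    tau_d = 1.0                      # difference time constant, in units of RC
    tau_s = 1.0 - k                  # sum time constant, in units of RC
    diff = 0.5 * (v_a - v_b) * math.exp(-t_over_rc / tau_d)
    residue = math.exp(-t_over_rc / tau_s) if tau_s > 0.0 else 0.0
    common = 0.5 * (v_a + v_b) * (k + (1.0 - k) * residue)
    return common + diff, common - diff

start = share_scale_voltage(1.0, 0.2, 0.5, 0.0)    # no settling: inputs unchanged
partial = share_scale_voltage(1.0, 0.2, 0.5, 3.0)  # 3 RC: sum settled, difference not
settled = share_scale_voltage(1.0, 0.2, 0.5, 50.0) # effectively complete settling
```

After three RC periods in this example the sum term has essentially reached k(V_A + V_B)/2, while the two capacitors still differ by the unsettled difference term, which is the larger error noted in the text.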

2.3.5.  Noise 

Noise arises in the system from the sampling as well as the sharing operations. When a switch turns on, its noise voltage appears on the participating capacitors. When the switch turns off, this noise is sampled onto the capacitors. The resultant voltage can be modeled as a random variable added to the final output. Depending on the operation performed, the noise voltage has an expected mean-square value as follows:

    (1/2)·(kT/C) for share

    (1 - k/2)·(kT/C) for share and scale (scaling factor k)


A  larger  capacitor  reduces  the  total  noise  in  the  system.  It  also  helps  reduce  other  non-idealities 
in  the  system  such  as  charge  injection,  accumulation,  etc.  However,  this  comes  at  the  cost  of 
either  lower  speed  (larger  RC  time  constant)  or  larger  power  consumption  (same  RC  time 
constant). 
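
The capacitor-size trade-off can be illustrated numerically. The sketch below treats the kT/C expressions above as mean-square noise voltages and returns the corresponding rms value; the function name, the 300 K temperature, and the 100 fF and 400 fF capacitor values are assumptions chosen only for illustration.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def sampled_noise_rms(c_farads, k=None, temp_kelvin=300.0):
    """rms noise voltage sampled onto one capacitor.

    k=None        -> plain share:       (1/2) * kT/C
    k in (0, 1]   -> share and scale:   (1 - k/2) * kT/C
    """
    factor = 0.5 if k is None else 1.0 - k / 2.0
    return math.sqrt(factor * BOLTZMANN * temp_kelvin / c_farads)

noise_100f = sampled_noise_rms(100e-15)   # share on 100 fF capacitors
noise_400f = sampled_noise_rms(400e-15)   # 4x the capacitance
```

Because the mean-square noise scales as 1/C, quadrupling the capacitance only halves the rms noise (about 6 dB), while the RC time constant or the switch power grows by the same factor of four.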

The  sampled  noise  voltages  are  also  correlated  to  each  other  by  virtue  of  having  the  same  source 
(shared  switch).  This  correlation  is  also  modeled  and  accounted  for  when  calculating  the  total, 
effective  noise  present  at  the  output  of  the  system. 


2.4.  Passive  switched  capacitor  computations  (additions,  multiplications) 

For  performing  any  linear  function,  addition  and  multiplication  operations  need  to  be  performed. 
Note  that  all  passive  switched  capacitor  operations  are  destructive  in  nature.  Therefore,  once  an 
operation  is  performed,  the  input  values  are  lost.  For  performing  multiple  operations  on  a  single 
input,  multiple  copies  of  the  input  need  to  be  maintained. 

We have explored different techniques to perform these operations using passive switched capacitor circuits. We compared these techniques based on their robustness to non-idealities, ease of implementation, power consumption, speed, etc., and selected suitable architectures for these computations.

2.4.1.  Addition 


Parallel  connection 

Using passive switched capacitors, addition may be performed by sharing the charges on two participating capacitors by connecting them in parallel, as shown in Fig. 3. The result of this operation is the average value (V1 + V2)/2 of the input voltages V1 and V2, which is a scaled version of their sum. Also note that two copies of the output are obtained, and these can be used for two independent operations later. However, the operation inherently attenuates the output by half.

Fig. 3: Addition using parallel connection
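
The parallel share operation is simply charge conservation, which the following sketch states in code. It assumes ideal capacitors and complete settling; the function name and example voltages are hypothetical.

```python
def share(voltages, capacitances):
    """Ideal, fully settled parallel charge sharing.

    Every participating capacitor ends at total charge / total capacitance.
    """
    total_charge = sum(v * c for v, c in zip(voltages, capacitances))
    total_cap = sum(capacitances)
    return total_charge / total_cap

v1, v2 = 0.6, 0.2
v_out = share([v1, v2], [1.0, 1.0])  # both capacitors now hold (v1 + v2) / 2
```

With equal capacitors the result is the average of the inputs, available as two copies (one on each capacitor); unequal capacitors would give a weighted average instead.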

From  an  implementation  perspective,  use  of  parallel  capacitors  allows  the  sharing  of  one  plate 
(ground  plate)  for  all  the  capacitors.  This  can  greatly  reduce  the  parasitic  capacitance  and 
resistance  of  the  capacitor,  and  the  area  of  the  overall  implementation. 

Series  connection 

An alternative technique is to connect the capacitors in series. The result of this operation is the sum (V1 + V2) of the input voltages V1 and V2. In this scheme, it is possible to use slightly delayed clock phases for the top and bottom plate switches in order to make the charge injection independent of the input voltage [Guil05]. However, this technique requires switches on both the top and bottom plates, thereby increasing the power consumption of the circuit. Also, the two switches placed in series halve the speed of the circuit for identical switch sizes. Moreover, only one output (which can be used for exactly one subsequent operation) is obtained.

Considering  the  improvements  in  speed,  power,  area,  and  the  availability  of  two  outputs  for 
further  operations,  the  parallel  connection  scheme  was  selected  for  further  analysis. 

2.4.2.  Multiplication 

Charge  stealing 

Multiplication  in  the  charge  domain  can  be  performed  by  scaling  the  voltage  on  a  capacitor  using 
a  share  operation  with  another  known  capacitor  (stealing  capacitor).  The  charge  on  the  stealing 
capacitor  is  not  utilized  later.  The  overall  operation  causes  a  sub-unity  scaling  on  the  original 
value.  The  scaling  factor  for  a  capacitor  of  value  C  and  a  stealing  capacitor  of  value  Ck  is  given 
by  k  =  C/(C+Ck). 

Fig. 4 shows a scaling operation using a stealing capacitor of size Ck with no initial voltage on it. After the sharing operation, the final value on the capacitor with initial value V0 becomes V0·C/(C+Ck). Ck can be chosen appropriately to obtain a particular scaling factor.

Note that although this technique can perform both sub-unity scaling and multiplication with a known attenuation, at least one of the operands needs to be known in advance. If variable capacitors are utilized, dynamic operands can also be used.
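
Choosing Ck for a given coefficient follows from inverting k = C/(C+Ck). The sketch below does this for a hypothetical target coefficient of cos(π/4); the function names are illustrative, and ideal capacitors with complete settling are assumed.

```python
import math

def stealing_cap_for(k, c=1.0):
    """Stealing capacitor Ck that scales the voltage on C by k = C / (C + Ck)."""
    if not 0.0 < k <= 1.0:
        raise ValueError("passive charge stealing only gives 0 < k <= 1")
    return c * (1.0 - k) / k

def charge_steal(v0, c, ck):
    """Voltage left on C after sharing with an initially empty Ck."""
    return v0 * c / (c + ck)

k_target = math.cos(math.pi / 4)          # hypothetical fixed coefficient
ck = stealing_cap_for(k_target)           # in units of C
v_scaled = charge_steal(1.0, 1.0, ck)
```

Small coefficients require a stealing capacitor much larger than C (Ck/C = (1 - k)/k), which is one reason fixed, moderately sized coefficients suit this technique best.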

PWM 

Another technique to perform multiplication using passive switched capacitors is to modulate the turn-on time of the switch and perform an incomplete share operation with a fixed stealing capacitor. The duration of the operation determines the multiplication factor, so it is possible to multiply two unknown operands using this technique. However, given the non-linearity in the switch resistance and the share operation, the errors caused by this technique make it unusable. The concept can nevertheless be used to devise another PWM scheme that allows complete settling and is therefore more reliable.

In this modified technique, the switch is turned on using a sequence of randomly placed pulses, and the capacitor charge is shared with a small stealing capacitor in each clock cycle that has an on pulse. The stealing capacitor is discharged at the end of each cycle. Complete settling is allowed in each cycle. The total number of on-pulses determines the amount of scaling. Maximum scaling is obtained when all the clock cycles have on pulses, while no scaling is obtained when all the clock cycles have off pulses.

Fig. 4: Addition using parallel connection

Although  this  technique  is  relatively  accurate,  and  is  able  to  handle  dynamic  operands,  it  is  slow 
and  consumes  more  power  than  the  charge  stealing  technique.  Also,  depending  on  the  accuracy 
required,  the  attenuation  is  considerable. 
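
The pulse-count scheme can be sketched as repeated, fully settled charge stealing. The model below assumes an ideal stealing capacitor that is completely discharged after every cycle; the function name, the pulse pattern, and the capacitor ratio are hypothetical.

```python
def pulsed_scale(v0, on_pulses, c=1.0, c_steal=0.05):
    """Scale v0 by sharing with a small stealing capacitor on every 'on' pulse.

    on_pulses : iterable of truthy/falsy values, one per clock cycle
    c         : capacitor holding the operand
    c_steal   : small stealing capacitor, reset at the end of each cycle
    """
    per_pulse = c / (c + c_steal)   # attenuation of one fully settled share
    v = v0
    for on in on_pulses:
        if on:
            v *= per_pulse
    return v

pattern = [1, 0, 1, 1, 0, 0, 1, 0]
v_scaled = pulsed_scale(1.0, pattern)
```

In this model the result depends only on the number of on pulses, not on their position, and the available scaling factors form a geometric grid with ratio C/(C + C_steal): a finer grid needs a smaller stealing capacitor and therefore more cycles to reach a given attenuation, which reflects the speed and power penalty noted above.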

Current  domain 

If the charge is converted to the current domain, a single, variable-duration pulse PWM scheme can be used to perform multiplication. Also, multiplication would not entail an inherent attenuation. However, the technique is very power hungry, and the accuracy of the transconductance amplifier that translates from the charge to the current domain needs to be very high.

We have focused on the charge stealing concept for performing multiplications because of its low power characteristics. For most linear algebra problems, multiplication by fixed coefficients is sufficient, and this technique lends itself easily to such applications.


2.5.  Circuit  techniques  to  overcome  non-idealities 

A number of circuit techniques were devised to tackle the non-linearities discussed above in the addition and multiplication operations. Parallel charge sharing for addition and charge stealing for multiplication were assumed when devising these techniques.

2.5.1.  Switching  schemes 

For reducing the attenuation, different sharing techniques can be used. Two capacitors are shared as shown in Fig. 3. We can combine a share followed by scaling into a single operation by connecting 3 capacitors (2 with input samples and 1 empty) and sharing their charges. This can be performed in different ways using 2 switches or 3 switches, as shown in Fig. 5(a,b,c). Additionally, it is possible to reduce the kT/C noise contribution of the overall structure by using 3 appropriately sized switches in the scheme of Fig. 5(c). Similarly, different schemes may be used for sharing 4 and 5 capacitors for scaling by complex factors of the form cr + j·ci (see Section 2.4 for complex operand scaling techniques), as shown in Fig. 5(d,e,f,g,h,i). While some schemes (Fig. 5(b,d,g)) ensure symmetry, others (Fig. 5(a,e,h)) offer faster computations for the same or a lower number of switches as compared to Fig. 5(b,d,g). Some schemes (Fig. 5(c,f,i)) provide both speed and symmetry at the cost of a larger power dissipation. Also, different schemes, with their appropriate switch sizes, provide different trade-offs with regard to noise contribution, charge injection error, clock feed-through error, etc. Based on the models we developed, we optimized the choice of specific switching schemes to improve the noise, speed, and non-linearity of the overall system. A summary of the trade-offs (numbers are relative to a single switch share operation in Fig. 3) is shown in Table 1 below.

Fig. 5: Different switching schemes for use in the FFT processing engine


Table 1: Summary of trade-offs among different switching schemes

Switching scheme used            τs (RC)   τd (RC)   Psw     Psw(τs+τd)/N   Noise (kT/C)
2 switch (Fig. 5(a))             1         1         2       2              1 - k/2
3sw star (Fig. 5(b))             2         1         3       4.5            1 - k/2
3sw triangle (Fig. 5(c))         1         0.33      3       2              1 - k/2
3sw noise optimum (Fig. 5(c))    1         0.58      2.37    1.87           0.87 - k/2
4 switch (Fig. 5(d))             1         1         4       2              1 - k/4
Pentagon (Fig. 5(h))             1         0.5       5       1.88           1 - k/4
10 switch (Fig. 5(i))            1         0.2       10      3              1 - k/4
10sw optimum (Fig. 5(i))         1         0.45      5.85    2.12

2.5.2. Wire swapping

As discussed previously, there is a clear trade-off between power and speed for the charge based FFT engine. However, it is possible to exploit the presence of multiple copies, the differential nature of the design, and good matching accuracy to speed up the processing by using a faster clock than the design would otherwise allow.

To understand this, consider the example shown in Fig. 6. The figure shows a pair of capacitors, (1+) and (2+), sharing their charges and settling to a final value over time. Another pair of capacitors, (1-) and (2-), in the differential implementation share their charges similarly. If the voltages are not allowed to settle completely, the final differential outputs ((1+) - (1-)) and ((2+) - (2-)) will be incorrect. However, if we swap one set of voltages and instead compute the outputs as ((1+) - (2-)) and ((2+) - (1-)), the differential outputs will still be correct.

Fig. 6: Incomplete settling for high speed processing


Since  the  time  constant  for  differential  settling  is  higher  than  that  for  common  mode  settling, 
incomplete  settling  using  wire  swapping  can  be  very  advantageous. 
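
The cancellation obtained by wire swapping can be checked with a small numerical model. The sketch below assumes an idealized case: perfectly matched positive and negative halves, antisymmetric inputs (zero common mode), and a single linear time constant for the difference term of an equal-capacitor share. All names and values are hypothetical.

```python
import math

def share_incomplete(v1, v2, t_over_tau):
    """Two equal capacitors sharing charge for a finite time.

    The average is preserved by charge conservation; only the difference
    term still has to settle.
    """
    avg = 0.5 * (v1 + v2)
    half_diff = 0.5 * (v1 - v2) * math.exp(-t_over_tau)
    return avg + half_diff, avg - half_diff

def differential_outputs(v1, v2, t_over_tau):
    p1, p2 = share_incomplete(v1, v2, t_over_tau)     # (1+), (2+)
    n1, n2 = share_incomplete(-v1, -v2, t_over_tau)   # (1-), (2-)
    direct = (p1 - n1, p2 - n2)    # conventional pairing
    swapped = (p1 - n2, p2 - n1)   # wire-swapped pairing
    return direct, swapped

direct, swapped = differential_outputs(0.3, -0.1, t_over_tau=1.0)
ideal = 0.3 + (-0.1)   # fully settled differential output, 2 * average
```

In this idealized model the residual difference terms on the positive and negative halves are equal and opposite, so they cancel exactly in the swapped pairing even after a single time constant. In a real circuit the cancellation would be limited by mismatch and by the signal-dependent switch resistance.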


2.5.3.  Noise  reduction 

As discussed earlier, the correlation between sampled noise copies can be utilized to cancel out a part of the noise in operations. Also, the switch scheme (number of switches, their connectivity, and their relative sizes) affects the total noise sampled onto the participating capacitors.

Fig. 7: Share and scale scheme using three switches for noise optimization

For  example,  for  a  share  operation  (voltage  averaging)  between  two  capacitors,  using  three 
switches  instead  of  two  reduces  the  total  noise  sampled  onto  the  capacitors.  This  is  shown  in  Fig. 
7.  The  switches  are  represented  by  their  average  turn  on  resistances.  For  optimal  noise 
performance,  two  switches  need  to  be  equal  in  size  while  the  third  switch  has  a  size  smaller  than 
these.  Plotting  the  total  noise  power  for  different  sizes  of  the  third  switch,  we  find  an  optimal 
switch  size. 


Similar  schemes  have  been  devised  for  the  more  elaborate  sharing  techniques  discussed  earlier. 
The  performance  of  optimal  switching  schemes  has  been  summarized  in  Table  1. 

In  addition  to  this,  the  correlation  among  the  noise  samples  can  be  used  to  reduce  their  effect  by 
appropriately  sharing  capacitors  during  a  later  operation  in  a  sequence.  Methods  to  perform  this 
have  also  been  devised. 


2.5.4.  Variable  selection 

A number of variables appear in the optimization problem. These variables affect the speed, power, and accuracy of the passive switched capacitor circuitry in complex ways. The design entails multiple trade-offs, as summarized in Table II. The operation speed (clock frequency) of the design determines the Ron·C constant of the switch-capacitor combination. The Ron of the switch in turn depends on the switch size (Wsw, which determines Csw) and Vdd. Consequently, for a given sampling capacitance C, different combinations of Vdd and Csw (Wsw) can be used to provide a particular operation speed. Among these, we optimize and select the switch size and Vdd that provide the lowest power for each stage. Programmable level shifters can be utilized for the required Vdd scaling.

Table II: Trade-offs in charge domain FFT design

Variable   Increase (improves)                            Decrease (improves)
C          SNR, Charge-injection, Charge-accumulation,    Power, Speed
           Clock feed-through
W          Speed, Non-linear Ron                          Power, Charge-injection, Charge-accumulation,
                                                          Clock feed-through
Vdd        Speed, Non-linear Ron                          Power
Vsw        SNR                                            Non-linear Ron


There are similar trade-offs between the speed, noise, and non-linearity of the system. Increasing the capacitance improves the noise floor, which for our sampled data system is dominated by kT/C noise. It also improves the robustness of the system to charge-injection and clock feed-through errors. However, a larger capacitance degrades the Ron·C constant and slows down the system. Increasing the input voltage swing (Vsw) improves the signal-to-noise ratio. However, this introduces significant non-linearities in the switch on-resistance. It is possible to increase Vdd to reduce this non-linearity and improve the speed, but at the cost of a larger power dissipation (quadratic dependence).


2.6.  FFT  computation 


The DFT is mathematically defined as:

    X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2 \pi k n / N},   k = 0, 1, ..., N-1

The constituent functions in this definition are multiplications by sub-unity twiddle factors and additions of the results. While these functions, especially the former, can be expensive (high power) in the digital domain, they can be computed inexpensively using passive switched capacitor circuits. We perform additions in the charge domain by sharing the charge on multiple capacitors as shown in Fig. 8(a) (an average is a scaled sum), and perform sub-unity scaling by stealing charge from a capacitor onto another capacitor as shown in Fig. 8(b). Scaling with complex twiddle factors can be performed using a combination of the addition and scalar multiplication operations, as shown in Fig. 8(c). Note that 2 copies of each sample are required for complex multiplications.

Fig. 8: Addition and multiplication using passive switched capacitor circuits
(a) V = (V1 + V2)/2
(b) V = V0·C/(C + Ck)
(c) Complex multiplication: Y = (A + B)·(cr + j·ci); A and B are inputs; w = (cr + j·ci) is a complex twiddle factor

The  Fast  Fourier  Transform  (FFT)  algorithm  can  be  used  to  reduce  the  number  of  computations. 
The  input  signal  is  first  sampled  onto  a  set  of  capacitors.  These  can  then  be  used  to  perform  all 
computations  in  the  butterfly  using  the  different  charge  sharing  operations  described  previously. 
All  calculations  are  performed  on  the  original  sampled  charge  using  in-place  computations,  and 
the  original  values  are  therefore  destroyed.  The  operations  being  destructive,  multiple  copies  of 
the  sampled  inputs  are  required  in  order  to  perform  multiple  operations  on  the  same  input 
sample.  Within  a  butterfly,  each  input  is  operated  on  twice  (2  copies),  and  each  operation  is  a 
complex  add  and  multiply  (2  more  copies).  Therefore,  4  copies  of  each  value  are  required  at  all 
stages  in  the  FFT  computation.  Although  the  presence  of  multiple  copies  is  an  overhead  in  this 
technique,  it  becomes  negligible  when  compared  to  the  orders  of  magnitude  improvement  in 
power  savings.  Also,  the  presence  of  multiple  copies  can  be  exploited  to  obtain  other  benefits  as 
discussed  earlier. 
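
The effect of replacing every addition with an average can be seen in a behavioral model of the butterfly structure. The sketch below is a standard radix-2 decimation-in-time FFT in which each butterfly outputs (E ± w·O)/2, mimicking a share operation; twiddle multiplication is treated as ideal and the subtraction is assumed to come from the differential implementation. It is an illustration under these assumptions, not the report's Matlab model.

```python
import cmath

def averaging_fft(x):
    """Radix-2 DIT FFT whose butterflies average instead of add.

    len(x) must be a power of two. The output equals DFT(x) / len(x).
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = averaging_fft(x[0::2])
    odd = averaging_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        # Each input pair is used twice, so two copies would be consumed.
        out[k] = 0.5 * (even[k] + w * odd[k])
        out[k + n // 2] = 0.5 * (even[k] - w * odd[k])
    return out

samples = [((3 * i) % 7) - 3 for i in range(16)]
spectrum = averaging_fft(samples)
```

Each of the log2(N) stages halves the signal, so a 16-point transform built this way returns the DFT divided by 16. This built-in attenuation is one reason the noise and non-linearity floors determine the usable dynamic range.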


2.7.  Detailed  FFT  modeling  and  results 

Based  on  the  mathematical  models  we  developed  for  the  different  performance  metrics  as  a 
function  of  the  design  variables,  we  designed,  optimized  and  simulated  a  16  point  FFT  engine  in 
Matlab.  Dynamic  range,  speed,  and  power  were  the  major  optimization  parameters  emphasized. 
Results  from  the  example  design  based  on  the  modeling  and  optimization  described  above  are 
shown  in  Fig.  9.  The  modeling  is  based  on  a  65nm  technology  node.  The  red  lines  represent 
results  from  different  runs  with  a  tone  in  a  particular  FFT  bin.  The  average  of  the  resultant 
magnitudes  forms  the  envelope  shown  by  the  yellow  line.  This  is  the  average  dynamic  range  that 
can be obtained from the system (about 65 dB, equivalent to 11 bits of digital resolution). In case
multiple  measurements  can  be  taken,  the  complex  outputs  can  be  averaged  to  reduce  the 
effective  noise  in  the  output.  The  blue  line  represents  an  average  of  256  independent  runs. 
Assuming  that  the  noise  is  no  longer  the  dominant  source  of  error,  the  blue  curve  then  represents 
the  non-linearity  floor  in  the  overall  system  (~90dB,  or  15  bits  of  resolution).  This  dynamic 
range  is  sufficient  to  block  out  jammers  in  a  typical  war  theater  for  survival  radio  applications. 
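
The quoted bit resolutions and the benefit of averaging can be cross-checked with two standard relations. The sketch below assumes the usual ideal-quantizer rule SNR ≈ 6.02·bits + 1.76 dB and uncorrelated noise between runs, so that averaging M runs lowers the noise floor by 10·log10(M) dB. The report does not state which conversion it used, so this is only a consistency check with hypothetical function names.

```python
import math

def effective_bits(dynamic_range_db):
    """Equivalent resolution from the ideal-quantizer SNR relation."""
    return (dynamic_range_db - 1.76) / 6.02

def averaging_gain_db(n_runs):
    """Noise-floor improvement from averaging n_runs uncorrelated runs."""
    return 10.0 * math.log10(n_runs)

bits_single = effective_bits(65.0)     # single-shot dynamic range
bits_floor = effective_bits(90.0)      # non-linearity floor
gain_256 = averaging_gain_db(256)      # averaging of 256 runs
```

Under these assumptions, 65 dB and 90 dB round to 11 and 15 bits respectively, and averaging 256 runs lowers the noise by roughly 24 dB, which would bring a 65 dB noise-limited result to about 89 dB, in line with the reported floor near 90 dB.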

The FFT engine is able to operate at a speed of 5 GS/s with a 2 GHz input signal bandwidth. The total simulated power consumed by the FFT processor is 0.5 mW for sampling and 0.5 mW for processing, or 1 mW in total.


3.  Conclusions 

Sampled charge passive switched capacitor circuits were identified as suitable candidates for survival radio applications. Speed, power, and dynamic range were recognized as critical parameters, and a detailed analysis was performed to evaluate the effect of the circuit variables on these parameters based on a 65nm CMOS technology node. Based on the analysis, models were developed, and an optimization framework was designed to optimize these parameters. We were able to design and simulate a 16 point FFT processor that processes a 2 GHz wideband input signal using only 1 mW total power, with a dynamic range exceeding 65 dB without noise averaging and 90 dB with noise averaging. This dynamic range is sufficient for tackling very large jammers in a typical wireless war theater communications scenario with friendly and adversarial jammers, multipath fading, and difficult NLOS operation.